NSF PAR Search | NSF Public Access Repository

Enforcing Signal Temporal Logic Specifications in Multi-Agent Adversarial Environments: A Deep Q-Learning Approach

https://doi.org/10.1109/CDC.2018.8618746

Muniraj, D; Vamvoudakis, K. G; Farhood, M. (January 2019, 2018 IEEE Conference on Decision and Control (CDC))

This work addresses the problem of learning optimal control policies for a multi-agent system in an adversarial environment. Specifically, we focus on multi-agent systems where the mission objectives are expressed as signal temporal logic (STL) specifications. The agents are classified as either defensive or adversarial. The defensive agents are maximizers, namely, they maximize an objective function that enforces the STL specification; the adversarial agents, on the other hand, are minimizers. The interaction among the agents is modeled as a finite-state team stochastic game with an unknown transition probability function. The synthesis objective is to determine optimal control policies for the defensive agents that implement the STL specification against the best responses of the adversarial agents. A multi-agent deep Q-learning algorithm, which is an extension of the minimax Q-learning algorithm, is then proposed to learn the optimal policies. The effectiveness of the proposed approach is illustrated through a simulation case study.

Full Text Available

Search for: All records